Agenda

• We want more fluid in games

• Eulerian (grid based) fluid. 

• Sparse Eulerian Fluid. 

• Feature Level 11.3 Enhancements! 


(Not a talk on fluid dynamics) 
Why Do We Need Fluid in Games?






Replace particle kinematics! 

• more realistic == better immersion


Game mechanics?

• occlusion

• smoke grenades

• interaction

• Dispersion

• air ventilation systems

• poison, smoke



Endless opportunities! 
Eulerian Simulation #1


My (simple) DX11.0 Eulerian fluid simulation:




2x Velocity 


2x Pressure 


1x Vorticity
Eulerian Simulation #2

• Inject > Add fluid to simulation

• Advect > Move data at XYZ -> (XYZ + Velocity)

• Pressure > Calculate localized pressure

• Vorticity > Calculate localized rotational flow

• Evolve > Tick simulation
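
The Advect step above can be illustrated in one dimension. This is only a sketch of the wording on the slide (data at x moves to x + velocity); the function name, the nearest-cell rounding and the drop-at-the-boundary behaviour are assumptions for illustration, not the talk's shader code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// 1D illustration of "move data at XYZ -> (XYZ + Velocity)".
// Hypothetical helper: velocity is measured in cells per tick, the
// destination is rounded to the nearest cell, and data that leaves
// the grid is dropped.
std::vector<float> advect1D(const std::vector<float>& density,
                            const std::vector<float>& velocity)
{
    std::vector<float> out(density.size(), 0.0f);
    for (std::size_t x = 0; x < density.size(); ++x) {
        const long dst = static_cast<long>(x) + std::lround(velocity[x]);
        if (dst >= 0 && dst < static_cast<long>(out.size())) {
            out[static_cast<std::size_t>(dst)] += density[x];
        }
    }
    return out;
}
```

A full tick would call the five stages in the order listed: inject, advect, pressure, vorticity, evolve.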
(some imagination required)
Too Many Volumes Spoil the...


• Fluid isn't box shaped. 

• clipping 

• wastage 

• Simulated separately. 

• authoring 

• GPU state 

• volume-to-volume interaction

• Tricky to render. 
Problem!

• N-order problem

• 64^3 = ~0.25M cells

• 128^3 = ~2M cells

• 256^3 = ~16M cells

• ... 

• Applies to: 

• computational complexity 

• memory requirements 
Texture3D - 4x16F



And that's just 1 texture... 
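
To put numbers on that, here is a small worked calculation for one dense volume. It assumes only what the slide title states: a 4x16F texel, i.e. four 16-bit floats = 8 bytes per cell. The function name is illustrative.

```cpp
#include <cstdint>

// Bytes needed by one fully allocated N^3 volume.
// 4x16F = four 16-bit floats = 8 bytes per cell.
constexpr std::uint64_t denseVolumeBytes(std::uint64_t n,
                                         std::uint64_t bytesPerCell = 8)
{
    return n * n * n * bytesPerCell;
}

static_assert(denseVolumeBytes(128)  == 16ull * 1024 * 1024,         "128^3  -> 16 MiB");
static_assert(denseVolumeBytes(256)  == 128ull * 1024 * 1024,        "256^3  -> 128 MiB");
static_assert(denseVolumeBytes(1024) == 8ull * 1024 * 1024 * 1024,   "1024^3 -> 8 GiB");
```

Doubling the resolution multiplies the cost by eight, for every texture the simulation keeps.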
Bricks

• Split simulation space into groups of cells (each 
known as a brick). 

• Simulate each brick independently. 





GAME DEVELOPERS CONFERENCE® 2015 


MARCH 2-6, 2015 GDCONF.COM 


Brick Map 

• Need to track which bricks contain fluid 


• Texture3D<uint> 

• 1 voxel per brick 

• 0 -> Unoccupied

• 1 -> Occupied

Could also use packed binary grids [Gruen15], but this
requires atomics.
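
A CPU-side sketch of the brick map idea, for readers who want to see the cell-to-brick lookup spelled out. The struct and its member names are hypothetical stand-ins for the Texture3D<uint>; the brick size is a parameter here, not a value taken from the talk.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Int3 { int x, y, z; };

// Stand-in for the Texture3D<uint> brick map: one entry per brick,
// 0 = unoccupied, 1 = occupied. All names are illustrative.
struct BrickMap {
    Int3 bricks;                          // number of bricks along each axis
    Int3 brickSize;                       // cells per brick along each axis
    std::vector<std::uint32_t> occupied;  // bricks.x * bricks.y * bricks.z entries

    BrickMap(Int3 b, Int3 s)
        : bricks(b), brickSize(s),
          occupied(static_cast<std::size_t>(b.x) * b.y * b.z, 0u) {}

    // Brick that contains a given simulation cell.
    Int3 brickOf(Int3 cell) const {
        return { cell.x / brickSize.x, cell.y / brickSize.y, cell.z / brickSize.z };
    }

    // Linear index of a brick inside the map.
    std::size_t index(Int3 brick) const {
        return (static_cast<std::size_t>(brick.z) * bricks.y + brick.y) * bricks.x + brick.x;
    }

    void markOccupied(Int3 cell)     { occupied[index(brickOf(cell))] = 1u; }
    bool isOccupied(Int3 cell) const { return occupied[index(brickOf(cell))] != 0u; }
};
```

One uint per brick keeps writes independent, which is why no atomics are needed, at the cost of more memory than a packed bit grid.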
Tracking Bricks

• Initialise with emitter

• Expansion ( unoccupied -> occupied )

• if { |V_x|y|z| > |D_brick| }

• expand in that axis

• Reduction ( occupied -> unoccupied )

• inverse of Expansion

• handled automatically
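
The expansion test can be sketched per axis as below. This is an interpretation, not the talk's code: D_brick is treated as a single scalar threshold, and the sign of the velocity component picks which neighbouring brick to occupy. Both choices, and the function name, are assumptions.

```cpp
#include <array>
#include <cmath>

// Per-axis expansion decision for one brick.
// Returns -1, 0 or +1 brick steps along x, y and z.
// v:      a velocity sample from the brick (x, y, z components)
// dBrick: the threshold written as |D_brick| on the slide (assumed scalar)
std::array<int, 3> expansionOffsets(const std::array<float, 3>& v, float dBrick)
{
    std::array<int, 3> offsets = {{0, 0, 0}};
    for (int axis = 0; axis < 3; ++axis) {
        if (std::fabs(v[axis]) > std::fabs(dBrick)) {
            offsets[axis] = (v[axis] > 0.0f) ? 1 : -1;  // expand in that axis
        }
    }
    return offsets;
}
```

Reduction is then the inverse: a brick whose cells no longer pass the test is simply not re-marked.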
Sparse Simulation


Texture3D<uint> g_BrickMapRO;
AppendStructuredBuffer<uint3> g_ListRW;

// idx: 3D coordinate of the brick being tested
if (g_BrickMapRO[idx] != 0)
{
    g_ListRW.Append(idx);
}


* Includes expansion
Uncompressed Storage

[Figure legend: Occupied / Unoccupied]

Allocate everything; forget
about unoccupied cells.

Pros: 

• simulation is coherent in memory. 

• works in DX11.0.


Cons: 

• no reduction in memory usage. 
Compressed Storage

[Figure: Indirection Table (bricks A-H) pointing into Physical Memory; legend: Mapped / Unmapped]

Similar to List<Brick>
Pros: 

• good memory consumption. 

• works in DX11.0. 

Cons: 

• allocation strategies. 

• indirect lookup. 

• "software translation" 

• filtering particularly costly 
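
A minimal CPU model of the indirection-table approach, to make the "software translation" cost concrete. Everything here (struct, member names, the append-only allocation strategy) is a hypothetical sketch rather than the talk's implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::int32_t kUnmapped = -1;

// Illustrative model of compressed brick storage.
struct CompressedVolume {
    int bricksX, bricksY, bricksZ;          // bricks along each axis
    int brickSize;                          // cells per brick edge
    std::vector<std::int32_t> indirection;  // brick -> slot in pool, or kUnmapped
    std::vector<float> pool;                // brickSize^3 cells per allocated slot

    CompressedVolume(int bx, int by, int bz, int size)
        : bricksX(bx), bricksY(by), bricksZ(bz), brickSize(size),
          indirection(static_cast<std::size_t>(bx) * by * bz, kUnmapped) {}

    std::size_t cellsPerBrick() const {
        return static_cast<std::size_t>(brickSize) * brickSize * brickSize;
    }

    std::size_t brickIndex(int x, int y, int z) const {
        const int bx = x / brickSize, by = y / brickSize, bz = z / brickSize;
        return (static_cast<std::size_t>(bz) * bricksY + by) * bricksX + bx;
    }

    // Simplest possible allocation strategy: append a new slot to the pool.
    void mapBrick(std::size_t brick) {
        if (indirection[brick] != kUnmapped) return;
        indirection[brick] = static_cast<std::int32_t>(pool.size() / cellsPerBrick());
        pool.resize(pool.size() + cellsPerBrick(), 0.0f);
    }

    // The "software translation": every access pays for the table lookup.
    // Returns nullptr for cells in unmapped bricks.
    float* cell(int x, int y, int z) {
        const std::int32_t slot = indirection[brickIndex(x, y, z)];
        if (slot == kUnmapped) return nullptr;
        const int lx = x % brickSize, ly = y % brickSize, lz = z % brickSize;
        const std::size_t local =
            (static_cast<std::size_t>(lz) * brickSize + ly) * brickSize + lx;
        return &pool[static_cast<std::size_t>(slot) * cellsPerBrick() + local];
    }
};
```

Neighbouring cells can live in different slots, so a filtered fetch that straddles a brick border needs a separate lookup per brick.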
1 Brick = (4)^3 = 64

1 Brick = (1+4+1)^3 = 216

• New problem;

• "6n^2 + 12n + 8" problem.


Can we do better? 
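
The "6n^2 + 12n + 8" figure is the number of border cells a one-cell padding adds to an n^3 brick: (n + 2)^3 - n^3. A quick compile-time check of the slide's numbers (function names are illustrative):

```cpp
constexpr int cube(int n) { return n * n * n; }

// Extra cells added by a 1-cell border on every side of an n^3 brick.
constexpr int borderCells(int n) { return cube(n + 2) - cube(n); }

static_assert(borderCells(4) == 6 * 4 * 4 + 12 * 4 + 8, "(n+2)^3 - n^3 = 6n^2 + 12n + 8");
static_assert(borderCells(4) == 152,                     "152 border cells for n = 4");
static_assert(cube(4) + borderCells(4) == 216,           "(1+4+1)^3 = 216");
```

For n = 4 the border is more than twice the size of the 64-cell payload.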
Enter; Feature Level 11.3

• Volume Tiled Resources (VTR)!

• Extends 2D functionality in FL11.2

• Must check HW support: (DX11.3 != FL11.3)


// Query the DX11.3 device interface.
ID3D11Device3* pDevice3 = nullptr;
pDevice->QueryInterface(&pDevice3);

// Ask the driver which tiled resources tier the hardware supports.
D3D11_FEATURE_DATA_D3D11_OPTIONS2 support;
pDevice3->CheckFeatureSupport(D3D11_FEATURE_D3D11_OPTIONS2,
                              &support,
                              sizeof(support));

// Volume tiled resources require tier 3.
m_useTiledResources = support.TiledResourcesTier ==
                      D3D11_TILED_RESOURCES_TIER_3;
Tiled Resources #1

[Figure: Tiled Resource tiles mapped either to Memory or to NULL]

Pros:

• only mapped memory is 
allocated in VRAM 

• "hardware translation" 

• logically a volume texture 

• all samplers supported 

• 1 Tile = 64KB (= 1 Brick) 

• fast loads 
Tiled Resources #2

[Figure: Tiled Resource -> Tile Pool -> Physical Memory (tiles 0-7)]

1 Tile = 64KB (= 1 Brick)

BPP    Tile Dimensions
8      64x32x32
16     32x32x32
32     32x32x16
64     32x16x16
128    16x16x16


Gotcha: Tile mappings must be updated from CPU 
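
Each volume tile shape is chosen so that width x height x depth x bytes-per-texel comes to exactly 64KB. The check below works through that arithmetic; the struct and array names are illustrative, and the 64 BPP row (32x16x16) is the shape implied by the 64KB tile size.

```cpp
struct TileShape { unsigned bpp, width, height, depth; };

// Volume tile shapes by bits per pixel.
constexpr TileShape kVolumeTileShapes[] = {
    {   8, 64, 32, 32 },
    {  16, 32, 32, 32 },
    {  32, 32, 32, 16 },
    {  64, 32, 16, 16 },
    { 128, 16, 16, 16 },
};

constexpr unsigned tileBytes(const TileShape& t)
{
    return t.width * t.height * t.depth * (t.bpp / 8u);
}

static_assert(tileBytes(kVolumeTileShapes[0]) == 64u * 1024u, "8 BPP tile is 64KB");
static_assert(tileBytes(kVolumeTileShapes[1]) == 64u * 1024u, "16 BPP tile is 64KB");
static_assert(tileBytes(kVolumeTileShapes[2]) == 64u * 1024u, "32 BPP tile is 64KB");
static_assert(tileBytes(kVolumeTileShapes[3]) == 64u * 1024u, "64 BPP tile is 64KB");
static_assert(tileBytes(kVolumeTileShapes[4]) == 64u * 1024u, "128 BPP tile is 64KB");
```

A 4x16F texel is 64 BPP, so a brick for that format would be 32x16x16 cells.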
Latency Resistant Simulation #1


Naive Approach:

• clamp velocity to V_max

• CPU Read-back:

• occupied bricks.

• 2 frames of latency!

• extrapolate "probable" tiles.

[Figure: CPU and GPU frame timelines (Frame N to Frame N+3), marking "Data Ready" for N, N+1, N+2 and "N; Tiles Mapped"]
Latency Resistant Simulation #2


Tight Approach:

• CPU Read-back:

• occupied bricks.

• max{|V|} within brick.

• 2 frames of latency!

• extrapolate "probable" tiles.

[Figure: CPU and GPU frame timelines (Frame N to Frame N+3), marking "Data Ready" for N, N+1, N+2 and "N; Tiles Mapped"]
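
One way to read "extrapolate probable tiles": given the fastest speed in a brick and the read-back latency, work out how many bricks away the fluid could travel before the CPU's mapping takes effect, and map that margin ahead of time. The formula, units and names below are assumptions made for illustration; the talk does not spell out its exact rule.

```cpp
#include <cmath>

// Number of extra bricks to map around an occupied brick, per axis.
// maxSpeedCellsPerFrame: max{|V|} read back for the brick, in cells per frame
//                        (V_max in the naive approach)
// latencyFrames:         frames between read-back and tiles being mapped
//                        (2 on the slide)
// brickSizeCells:        brick edge length in cells
int probableBrickMargin(float maxSpeedCellsPerFrame, int latencyFrames, int brickSizeCells)
{
    const float travel = maxSpeedCellsPerFrame * static_cast<float>(latencyFrames);
    return static_cast<int>(std::ceil(travel / static_cast<float>(brickSizeCells)));
}
```

With the naive approach every brick uses the global V_max; the tight approach feeds in the per-brick maximum, so slow regions map fewer speculative tiles.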
Latency Resistant Simulation #3

[Flowchart: Sparse Eulerian Simulation]
Demo


Performance #1

[Chart: Sim. Time (ms) vs. Grid Resolution (128, 256, 384, 512, 1024)]

Grid Resolution      128^3     256^3     384^3     512^3     1,024^3

Full Grid
  Num. Bricks        256       2,048     6,912     16,384    131,072
  Memory (MB)        80        640       2,160     5,120     40,960
  Simulation         2.29ms    19.04ms   64.71ms   NA        NA

Sparse Grid
  Num. Bricks        36        146       183       266       443
  Memory (MB)        11.25     45.63     57.19     83.13     138.44
  Simulation         0.41ms    1.78ms    2.67ms    2.94ms    5.99ms

Scaling Sim.         78.14%    76.46%    75.01%    NA        NA


NOTE: Numbers captured on a GeForce GTX980
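
The memory rows are consistent with a simple model: bricks x volumes x 64KB per tile. The factor of five volumes is an inference from the earlier simulation slide (2x Velocity, 2x Pressure, 1x Vorticity), not something the table states; the function name is illustrative.

```cpp
// Memory in MB for a given number of occupied bricks, assuming each brick
// costs one 64KB tile in every simulation volume (five volumes inferred).
constexpr double brickMemoryMB(unsigned bricks, unsigned volumes = 5)
{
    return bricks * volumes * 64.0 / 1024.0;
}

static_assert(brickMemoryMB(256) == 80.0,   "full grid, 128^3");
static_assert(brickMemoryMB(36)  == 11.25,  "sparse grid, 128^3");
static_assert(brickMemoryMB(2048) == 640.0, "full grid, 256^3");
```

The same model gives 138.4375 MB for the 443 bricks of the sparse 1,024^3 case, which the table rounds to 138.44.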




Performance #2

[Chart: Memory (MB) vs. Grid Resolution (128, 256, 384, 512, 1024); series: Full Grid, Sparse Grid]


NOTE: Numbers captured on a GeForce GTX980
Scaling

• Speed ratio (1 Brick) = Time{Sparse} / Time{Full}

• ~75% across grid resolutions.


Grid Resolution      128^3     256^3     384^3     512^3     1,024^3
Scaling Sim.         78.14%    76.46%    75.01%    NA        NA
Summary

• Fluid simulation in games is justified. 

• Fluid is not box shaped! 

• One volume is better than many small. 

• Un/Compressed storage a viable fallback. 

• VTRs great for fluid simulation. 





Other latency resistant algorithms with tiled resources?
Questions?


Alex Dunn - adunn@nvidia.com 
Twitter: @AlexWDunn 


Thanks for attending. 



